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ABSTRACT 

We present a maximum likelihood method for fitting two-dimensional model distri- 
butions to stellar data in colour-magnitude space. This allows one to include (for 
example) binary stars in an isochronal population. The method also allows one to 
derive formal uncertainties for fitted parameters, and assess the likelihood that a good 
fit has been found. We use the method to derive an age of SS.Sj^g ^Myrs and a true 
distance modulus of 7.79lo;j!)5 mags from the V vs V — I diagram of NGC2547 (the 
uncertainties are 67 percent confidence limits, and the parameters are insensitive to 
the assumed binary fraction). These values are consistent with those previously de- 
termined from low-mass isochronal fitting, and are the first measurements to have 
statistically meaningful uncertainties. The age is also consistent with the lithium de- 
pletion age of NGC2547, and the HIPPARCOS distance to the cluster is consistent 
with our value. 

The method appears to be quite general and could be applied to any N-dimensional 
dataset, with uncertainties in each dimension. However, it is particularly useful when 
the data are sparse, in the sense that both the typical uncertainties for a datapoint 
and the size of structure in the function being fitted are small compared with the 
typical distance between datapoints. In this case binning the data will lose resolution, 
whilst the method presented here preserves it. 

Software im plementing the methods described in this paper is available from 
[http : / / www, astro .ex. ac .uk/people / timn / tau-squared /[ 
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- stars: fundamental parameters - open clusters and associations: general - open 
clusters and associations: individual: NGC2547 



1 INTRODUCTION 

The extraction of astrophysical parameters from colour- 
magnitude diagrams (CMDs), has been a crucial technique 
for astronomy since the discovery of the CMP as a diagnos - 
tic tool (almost certainly attributable to lHertzsprunsll91ll) . 
Since a coeval population of singe stars occupies a curve in a 
CMD, comparison with theoretical isochrones should allow a 
determination of global properties of the population such as 
age, distance and metallicity. Unfortunately such determi- 
nations have been hampered by the lack of good statistical 
methods for carrying out the comparison between observa- 
tion and theory. For galactic astronomy, the main technique 
has been a visual comparison of isochrones with the data 
(although more sophisticated methods have been used for 
resolved populations in other galaxies). This not only leads 
to questions of objectivity, but also makes it impossible to 
derive statistically meaningful uncertainties for parameter 
estimates. 



Were the problem simply fitting a set of datapoints 
with uncertainties in one dimension (say colour) to a curve 
then classical analysis would suffice. Unfortunately a 
datapoint in colour-magnitude space has uncertainties in 
both colour and magnitude. (In addition t he uncertainties 
are n ormally correlated, but as shown by 'Tol stov fc Sahal 
Il996l this can be overcome by transforming the prob- 
lem into a magnitude-magnitude space.) This problem 
can still be so lved analytically if the curve is actually 
a straight line ijNerit e^^l llQSSl and references therein). 
iFlannery fc JohngonTi^S^ extended this analytical ap- 
proach to the general case of a curve by a small curvature 
approximation. Their method has been use d on a significant 
volume of data, including globular clusters iBorissova et alJ 
0^997; DurrcU & Ha,rris 1993) single-age extra galactic popu- 
lations ( Geor giev et al.ll999l) and young (< l OMyr-old) pop- 
ulations (iTrullols fc Jordilll997l : IJordi et alil.1996) . None of 
these studies make significant use of the uncertainty mea- 
surements, partly because of systematics, but partly there 
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is also the comment that they produce shallow spaces 
jHeaslev fc ChristiaJl98^ which result in large derived un- 
certainties'' (|Nobl^^iLK,99ll ). This is clearly in part because 
the isochrones do not fit the data well, but may also be a 
warning that, although one can place clusters in an age se- 
quence by eye, the absolute values of the ages, which must 
be derived by comparison with the model isochrones, are not 

as precise as we might hope. 

There is a further limitation of the lFlannerv fc Johnson) 

il982l) technique, p o inted out most explicitly by 
iGaladi-Enriauez et alj lll99d) : no population of stars 
consists entirely of single stars. Unresolved binaries make 
up a significant fraction of most photometric samples, and 
are seen as objects which lie up to 0.75 mags brighter than 
the single star sequence. Indeed, in some coeval populations 
a distinct equal-mass binary sequence is observed 0.75 
mags above the single-star sequence, with unequal-mass 
binaries lying between the two. Whilst one may be able 
to extract a single-star sequence by eye and then fit it 
l|Holland fc Harrislll992h . clearly the best way is to fit the 
binaries as well. 

Thus one arrives at the fundamental question we ad- 
dress in this paper. If the model is a two dimensional dis- 
tribution in the colour-magnitude plane, can we derive a 
statistical test to fit the data to the model? There has been 
some interest in using Bayesi an methods to de termine the 
age of each star in a CMD jjgrgensen fc Limiegren 20051 
and references therei n), and then using the mean for the 
cluster age. Although Ivon Hippel et all i200(tl demonstrate 
such a technique for age determination, it is clear their work 
will be developed to fit other parameters as well. The prob- 
lem here, though, is the absence of a goodness-of-fit param- 
eter to choose between isochrones. Another obvious solu- 
tion is to bin the data into pixels, and compare this with a 
model. iDpluhin (1997) and ADaricio ct al. (1997|) have de- 
velo ped this techniq ue for large extra-galactic populations, 
with lDolphinI i2002l) bringing much of the literature together 
into a cohesive method. The problem is, however, that our 
data are often sparse, by which we mean the typical separa- 
tion of datapoints is large compared with their uncertainties 
(see Figure . Then binning the data simply has the effect 
of washing out our hard-wo n photometric precision. 

iTolstov fc Sahal il996l) developed a technique which 
does retain the datapoints as points, and which can been 
seen as a relative of the method we use here. They created 
simulated observations with a similar number of datapoints 
to the observed dataset, and then used the distances in 
colour-magnitude space between the points of the simulated 
and actual obser vations as a fitting stati stic. Our method, 
first presented in lNavlor fc Jeffrie j i20o'5l) . improves on this 
by using a quasi-continuous 2D distribution as the model, 
which overcomes problems of sampling the model into a fi- 
nite number of datapoints, and allows robust uncertainties 
to be derived. 

The method we are proposing is relatively intuitive, so 
rather than embarking first on a formal analytical proof, 
we first give the intuitive interpretation (Section |2l , and 
then discuss a numerical experiment which demonstrates 
the technique using a simulated observation, allowing us to 
conclude that it recovers the correct answer and uncertain- 
ties (Section!^. The formal proofs are in Sections 2] and |S| 
and the details of practical implementation in Section|S| We 
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Figure 1. A simulated observation of a 40 Myr-old cluster. See 
text for details. 




Figure 2. The expected distribution of datapoints underlying 
the simulation in Figure [Tl 



draw all the work together in an example using real data in 
Section |7| before reaching our conclusions in Section |5] An 
alternative to reading the paper in this order would be to 
gain a working understanding from Sections and|7| and 
try the worked examples available with the software from 
http:/ /www. astro. ex. ac.uk/people/timn/tau-squared. 



2 AN INTUITIVE INTERPRETATION 

Figure shows a simulated observa tion of 100 stars drawn 
from a 40 Myr isochrone from iD'Antona fc Mazzitelli 
1^23)! henceforth referred to as the DAM97 isochrones. 
As for all the isochrones used in this paper we have con- 
verted the isochrones from effective temperature to colour- 
magnitude space usin g the relationships derived from fit- 
ting the Pleiades f see I Jeffries et aLll200ll for details). The 
cluster is assumed to be unreddened and at a distance 
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modulus of zero. The underlying power law mass function 
(AN /AM oc M~^'^) has been chosen to give a reasonable 
spread of stars over the magnitude range chosen. We have 
assumed that 50 percent of the objects are unresolved bina- 
ries, but that there are no higher order multiples. Ignoring 
the higher order multiples should be a small effect since 
only about 5 percent of systems have more than two mem- 
bers iPuauennov fc Mavoil[r99lll . The masses of the sec- 
ondary stars for the binaries are uniformly distributed be- 
tween the primary star mass and the lowest mass available 
in the DAM97 models. This is equivalent to assuming the 
mass-ratio distribution is flat. Whilst there are many claims 
for structure in the distribution, after selection effects have 
been taken into account it is hard to arg ue that a flat distr i- 
bution is inconsistent with the data (e.g. lMazeh et all2003l) . 
In addition, as we shall show later the binary fraction, and 
by implication mass ratio distribution, has little effect on 
the parameters derived from the fits. The presence of a low- 
mass cut-off in the DAM97 isochrones leads to the empty 
wedge between the single star sequence and the more equal- 
mass binaries visible below y = 9 in Figure |21 Stars in the 
wedge would represent binaries where the secondary star lies 
below the lowest mass available in the models, and hence no 
stars can be placed in this region. We shall show in Section 
17.81 that the wedge has a negligible effect on our derived 
parameters. 

Figure |5| shows the same model, but this time used 
to generate many more objects, creating a surface density 
in colour-magnitude space. (For ease of display we have it 
renormalised such that the integral along each horizontal 
row is one, but will ignore this renormalisation in what fol- 
lows). Were there no uncertainties in each datapoint, the 
relative probability of there being a datapoint at some point 
i at {ci,mi) is simply the value of Figure|5|at (ci, rrii), which 
we refer to as p{ci,mi). Thus each datapoint has an asso- 
ciated value of p, and if we multiply all these together, the 
resulting product, D can be used as a fitting statistic. How- 
ever, in analogy with we use — 21nD, which we call r^. 
One can then consider moving the model around the plane in 
colour and magnitude (or perhaps distance and reddening), 
until the value of D is maximised, or is minimised. 

Introducing uncertainties for each datapoint does not 
have a large impact on the method. We introduce a two- 
dimensional uncertainty function for each datapoint, which 
we call Ui (for definiteness, one could consider this to be 
a two-dimensional Gaussian). One must now consider an 
uncertainty function centred at (ci,mi), and then integrate 
the product of U and p (the probability distribution of Fig- 
ure to obtain a probability Pi. We then calculate as 
— 2]^lnPi. In fact, probably the most difficult problem is 
introduced by the nature of the astronomical data; since the 
uncertainties in, say V and V — I are correlated, we must 
actually integrate under two dimensional Gaussians whose 
axis is skewed with respect to the colour-magnitude grid (see 
Section 15.211 . 

It should be obvious from the above that this is a max- 
imum likelihood method. As such it can be viewed as either 
Bayesian, or conventional frequentist statistics. As we dis- 
cussed in the introduction it can be viewed as generalising 
the method of lTolstov fc Salial lll996f) to a model which pro- 
vides a continuous distribution. As we shall show in Section 
15.11 it can also be viewed as a generalisation of . 
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Figure 3. The space resulting from fitting the simulated 
observation in Figure [Tl (see Section i3.3l . The values of are 
linearly scaled, and the white lines are contours at the 67 percent 
(r^ = 317.3) and 95 percent (r^ = 322.8) confidence levels. 

3 A NUMERICAL EXPERIMENT 

Our numerical experiment was to find the age and distance 
modulus of the artificial cluster described in Section |5| from 
the simulated observation we described. We followed the 
classical statistical path of finding the best fit to the data, 
and hence derived estimated parameters (Section 13.11 . We 
then assessed whether this was a good fit (Section 13.211 . and 
then on the assumption it was, derived uncertainties in our 
fitted parameters (Section 13.311 . 

3.1 Fitting and parameter estimation 

We compared our 100 datapoint simulated observation with 
a series of model distributions with ages around 40Myr. The 
model distributions we tested against used the same binary 
fraction (50 percent) as the original simulation, and the same 
uniform mass-ratio distribution. We could have also used the 
same mass function as we used for the simulation. However, 
to do so would make this a highly unrealistic simulation of 
fitting real data. In practice, for deriving ages and distances 
the mass function is a nuisance parameter. Whilst one may 
think that a simple power-law could be assumed over the 
mass-range of interest, this would then have to be convolved 
with the (often unknown) mass-dependent membership se- 
lection criteria. For example, in Section|7|we shall use an X- 
ray selected sample to determine the age of NGC2547, and 
the precise effect of X-ray selection is unclear. We therefore 
normalise our model distributions to have a constant num- 
ber of stars per unit magnitude (e.g. Figure We refer to 
this procedure as "normalising-out" the mass-function, and 
will discuss its implications in detail in Section 16.3.31 For 
datasets with well understood membership selection criteria 
our procedures can, in principle be simplified by removing 
the normalising-out of the mass-function, allowing the mass 
function to be derived as well. 

We tested several different offsets in magnitude for each 
age, which yielded the space shown in Figure |3 There 
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Figure 4. The smooth curve is the expected distribution of the 
probability of obtaining a given calculated for the simulated 
observation (using the best fit model) in the way described in 
Section lOm The histo gram is the distribution of obtained by 
fitting 100 further simulated observations generated in the same 
was as the first. 

is a minimum at 42.5Myr, a distance modulus of -0.0195 
and T =311.9, which is close to the values of 40Myr and 
0.0 mags of the artificial cluster from which the simulated 
observation was drawn. 



3.2 Goodness-of-fit 

In the case of fitting one uses the F-test, which in essence 
is a prediction of the cumulative distribution of x^- We re- 
ject fits with -Pr(x^) below some critical value, e.g. 5 per- 
cent. (We use Pr{x) throughout the paper to signify the 
probability that a statistic exceeds the value x. The sub- 
script r differentiates it from P, the probability density in 
the colour-magnitude plane after applying the uncertainty 
function.) It turns out that numerical integration allows us 
to predict, after the fit is complete, the expected distribu- 
tion of (Section I6.3.H . and thus assess the goodness of 
fit. Such a distribution is the smooth curve in Figure 2] For 
the example given we expect our value of — 311.9 to be 
exceeded in 34 percent of fits. We can also normalise in 
a similar way to (for a large number of degrees of free- 
dom) by dividing by the value we expect to be exceeded 50 
percent of the time, which in this case gives a reduced r^, 
of = 1.02. 

We can check that this is correct by creating a fur- 
ther 100 simulated observations, and examining the range 
of this produces. Figure |1] shows (as a histogram) the 
distribution of for from the 100 simulations. The simula- 
tions suggest that 21 percent of observations would exceed 
= 311.9; clearly smaller than the 35 percent our theory 
yields. The reason for the difference is our normalising out 
of the mass-function (see Section r6.3.3t . Despite this (which 
is a fundamental limit of the data, not of the test), our 
method of calculating is good enough to show that the 
fit is good, and of course relative values remain useful for 
testing different models. 



3.3 Uncertainties for the parameters 

The simplest method of estimating the uncertainties would 
be to create simulated datasets starting with the parame- 
ters derived from the observation. However, our normalising 
out of the mass-function precludes us from doing this. We 
therefore produce bootstrap datasets by moving each data- 
point at constant magnitude onto the best-fit isochrone, and 
then adding to the two magnitudes a random offset drawn 
from a population with the appropriate Gaussian distribu- 
tion for the error bars associated with the datapoint. Since 
there is not a unique isochronal colour associated with each 
magnitude (because of the effects of binarism), we have to 
assign the datapoint to a position in colour using the rela- 
tive likelihood of each colour drawn from the model. Hence 
we have assumed that the probability of any given combina- 
tion of parameters being the correct one is identical to the 
probability of obtaining those parameters if the underlying 
model was actually the best-fitting model. We then make 
100 bootstrap datasets, and examine the resulting values 
of the derived parameters, using the RMS about the mean 
value as the uncertainty. This gives uncertainties in distance 
modulus and age of 0.012 mags and 1.1 Myr respectively. 

We can test these estimates of the uncertainties using 
the 100 simulated observations we created for Section f3.2l ^ 
These give a scatter in distance modulus and age of 0.011 
mags and 0.9 Myr, in good agreement with our bootstrap 
method for determining the uncertainties. 

In practice, we are interested in more than the sim- 
ple uncertainties, as there is a correlation between distance 
modulus and age. We deal with this in an analogous way 
to fitting by drawing a contour in the space which 
encloses a given fraction of the probability of where the so- 
lution lies. We take the values of the distance modulus and 
age derived from each bootstrap dataset, and find the cor- 
responding value of in our space derived from the first 
simulated observation (Figure]^. (Not the value of given 
by the fit to the bootstrap datasets.) This allows us to draw 
the contour at constant (317.3) which encloses 67 per- 
cent of the derived values, i.e. a "Ict" confidence limit. This 
is plotted in Figure |H] and shows the expected correlation 
between age and distance modulus. ^ 

Again we can test this using our simulated observations 
from Section f3. 21 We take the values of the distance modu- 
lus and age derived from each simulation, and find the val- 
ues associated with them from the space derived from 
the first simulated observation. 67 percent of them lie below 
r^=317.9. Given that we have 100 simulations, we actually 



^ The situation becomes unavoidably confusing at his point, as 
we now have two simulated groups of observations, each of 100 
realisations. We refer to the 100 simulated observations created 
for Section lij. 21 in the same way as our original simulated obser- 
vation, as "simulated observations". The 100 simulated datasets 
created in Section lij.3l which in a real case would be derived from 
the observation by forcing all the data back onto the isochrone 
and then scattering them according to their uncertainties we refer 
to as "bootstrap datasets". 

^ Note that the t'^ for the 67 percent confidence contour is not 
given by 1 — Pr(T'^) = 0.67. The analogous case for is that for 
one free parameter one uses the minimum + 1 a^s the confidence 
contour. 
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require the below which 67 percent of some large par- 
ent population lies, which we estimate is between 317.2 and 
318.1. This range which includes the value derived using our 
proposed technique, implying that technique is correct. 

We also tried using a m ore traditional bootstrap method 
fe.g. lWaU fc Jenkinj2003) . For such a bootstrap to work the 
values of the datapoints (or in our case the values calculated 
from them) mu st be ide ntically distributed (see for example 
Section 15.6 of |Press"et aL,.1992'l . It is quite clear that the 
ages and distance moduli derived from each datapoint are 
not identically distributed, but the traditional bootstrap of- 
ten works sufficiently accurately even when this assumption 
is quite strongly violated. To see if this was the case, we 
created 100 new data sets by randomly selecting 100 points 
from the original data. (Thus, as is normal in such a boot- 
strap method, a significant fraction of the datapoints in each 
realisation are the same.) We found the RMS for each pa- 
rameter using this dataset, which yields uncertainties of 1.2 
Myr and 0.011 mags. Again, these are consistent with those 
calculated using our method. However, we found that the 
suggested 67 percent confidence contour for is too low 
(314.7). To check that this failure of the traditional boot- 
strap was not due to our normalising out the mass function 
we performed a similar simulation using models which re- 
tained the mass function. Again we found our bootstrap 
gave a similar confidence interval to that implied by many 
simulated observations of the same "cluster", but in this 
case the traditional bootstrap overestimated the uncertain- 
ties. Clearly the derived parameters from each datapoint are 
not sufficiently close to identically distributed for the tradi- 
tional bootstrap to work. 



3.4 Conclusion 

In this section we have validated the test by simulating 
a dataset and recovering the original parameters. We have 
also shown that we can estimate reliable uncertainties in the 
measured parameters by creating bootstrap datasets. The 
"base" for the simulations is created by moving each point in 
colour space until it lies on the isochrone. By examining the 
range of the values of the parameters derived from these 
datasets, we can estimate confidence intervals analogous to 
those used in analysis. 
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Figure 5. A schematic showing a sequence S, an observed dat- 
apoint i and a point M which may be its position unperturbed 
by observational error. 



that model is, say, a (5-function at (c, m), then the prob- 
ability that our data originates from the model is simply 
the integral of the product of the 5- function and Ui. More 
generally, the likelihood for any given datapoint i is given 
by 



Pi 



Ui{c — Ci,m — mi)p(c, m)dc dm. 



(1) 



If there are A'' datapoints, the likelihood that the whole dis- 
tribution originates from the model is the product of the 
probabilities for each point. 



m — mi)p{c,m)dc dm (2) 



1=1, AT i=l,]V 
_2 



If we now define r as -21nD, then we arrive at the formal 
definition of r'^, 



— —2 In / Ui(c — Ci,m — mi)p(c,m)dc dm. (3) 

1=1, iV 



4 FORMAL DEFINITION 

Having shown by numerical experiment that t'^ can work, we 
must now put it on a formal mathematical footing. Figure|3 
shows a colour magnitude plane, with a sequence S and an 
observed datapoint i at {ci,mi). The datapoint will have an 
associated two-dimensional probability distribution, which 
we will assume is Gaussian. This allows us to calculate the 
probability that the true values of c and m lie within any 
specified range. Thus if the datapoint lies at {ci,mi), the 
probability that the true value lies within an elemental box 
of area dc dm about AI at (c, m) can be written as Ui{c — 
Ci,mi — m)dc dm, where [/ is a 2D function which represents 
the uncertainty for a given datapoint. 

We now assume that we have a model p which predicts 
the true density of stars in the colour-magnitude plane. If 



For most practical applications Ui has Gaussian uncertain- 
ties and is given by 



Ui{c — Ci,m — mi) = e 







(4) 



where and are the uncertainties in each measure- 
ment. 

There are two obvious interpretations of equation|3| The 
first is that one has simply taken the model and blurred it by 
the uncertainties in each datapoint. The likelihood is then 
simply the product of the values of the model at each point. 
Alternatively, we have integrated model probability under 
the 2D Gaussian uncertainty surface. In either interpretation 
the process of maximising this function to obtain the best 
fit is analogous to maximising the cross correlation function, 
though one uses the product rather than the sum of the 
individual pixel values. 
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5 SPECIAL CASES 

Before using our full two dimensional implementation of 
it is useful to reduce EquationElfor three special cases. These 
show how (i) is related to x^; (ii) that it gives the standard 
form for fitting a straight line to data with uncertainties 
in two dimensions; and (iii) th at it can also reduce to the 
same approximation as that of iFlannerv fc JohnsonI lll982ll 
for curve fitting with uncertainties in two dimensions. 



5.1 Curve fitting for data with one dimensional 
uncertainties 

The most important special case to derive is that for when 
the model predicts that a point whose true value is (c, m) 
should always have an observed value of d = c, but has a 
range of possible observed values rrii , represented by a Gaus- 
sian probability distribution. In this case should behave 
like x^- Removing the dependence on c from Equations |H] 
and 13] yields 



—2 5: in 

i=l,N 



p(m)Am. 



(5) 



Further, for any single datapoint p(m) is a 5 function centred 
on m, and with a normalisation we choose to be one, thus 



= -2 



(m ^ 



i-l,JV 



(6) 



This is the standard form for x fitting to a function with 
uncertainties in one dimension. Thus we have shown that 
is a special case of r^, where the model is a curve and the 
data have uncertainties in one dimension. 



5.2 A linear isochrone 

We now wish to examine the special case where the prob- 
ability distribution is a linear sequence, but the data now 
have uncertainties in both co-ordinates. We have three aims 
in presenting this special case. First, to show that the stan- 
dard form for fitting a straight line with uncertainties in two 
dimensions is a special case of r^. Second, we will test our 
(numerical) implementation of t"^ by checking it recovers the 
same answer as the analytical expression. We find that if this 
is to be the case we must use the correct normalisation for 
p, which we derive below. Finally, there is an intuitive in- 
terpretation of the analytical expression which is useful for 
interpreting the more general case of fitting a curve with 
uncertainties in both dimensions. 



5.2.1 Analytical form 

Formally we wish to assess the probability that a point i at 
{ci,mi) originates from the isochrone 

dm 



dc 



c + k, 



(7) 



where fc is a numerical constant. We begin by changing to 
a co-ordinate system {x,y), a process shown graphically in 
Figure |S| We first normalise by the uncertainties in each 
axis, then place (ci, rrii) at the origin, and finally rotate the 



system through an angle 6i such that the x-axis lies parallel 
to the sequence. (We use the subscript i to emphasize that 6 
depends on the uncertainties and so is potentially different 
for each datapoint.) Thus 



m — nii 

C — Ci 



= ycosOi + xsinOi, 
= xcosOi — ysinOi. 
Equation Qthen becomes 

2 p(x,y)dxdy. 



(8) 
(9) 

(10) 



In this co-ordinate system we denote the j/-distance between 
the a;-axis and the sequence j/o- We can now divide p{x,y) 
into p{x)p{y) where p{x) is constant and p{y) = except 
where y — yo. This allows us to separate the integrals, and 
find that 



(11) 
(12) 
(13) 



Now p{x) J p{y)dy is the number of objects per unit length 
in x, and in terms of the number of objects per unit magni- 
tude, p(m), is amiSmdip{m), thus 



Pi = y e 2 g 2 p(^x)p(y)dxdy 

_ii f _x2 f 

= e 2 p(2;) / e T-ds / p{y)dy 

I — -2a ( 
= y/2Tie 2 p{x) I p{y)dy. 



Pi = V2Tve 2 am,smeip{m). (14) 
Thus, Equation |3| becomes 

= ^ 2/0^2 ^ ln{a,niSm6ip{m)V2TT). (15) 



i = l,JV 



1=1, AT 



5.2.2 Intuitive interpretation 

This equation has an intuitive interpretation, which is espe- 
cially useful for what follows. The probability that a star at 
{ci,mi) originates from a given point on an isochrone is given 
by the probability that there is a datapoint whose true value 
lies at that point on the isochrone, multiplied by the prob- 
ability that the uncertainties could move it to {a, mi). For 
the whole isochrone, therefore, the probability that it will 
yield a point at {a, mi) is given by the line integral along 
the isochrone, multiplied at each point by the probability 
of it being moved to {ci,mi). This probability distribution 
is (in normalised units) simply a two-dimensional Gaussian 
distribution centred on (a, mi). Any linear cut through such 
a 2D Gaussian, such as that made by the isochrone, is it- 

self a ID Gaussian, but with is peak reduced by e t with 
respect to the 2D distribution. Thus the integral along the 
line is the integral under this ID Gaussian. The ratio of the 
integrals under ID and 2D Gaussians of equal peak height 
is V2tt, but this must also be multiplied by the decrease in 

_yl 

peak, e 2 ^ explaining the form of Equation 1131 
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Figure 6. A linear isochrone which makes an angle 9 with 
axes normalised by the uncertainties in each dimension. The x 
and j/-axes define a rotated co-ordinate system parallel with the 
isochrone, centred on the datapoint (c^, mi). Note magnitude axis 
is reversed. 



■ — 4- h 



dm 
dc 



(20) 



This is the standard expression to be minimised for fitting a 
straight lino to data witli uncertainties in both co-ordinates 
(e.g. Nerit et al..f989.') . wliicli demonstrates that such fitting 
is a special case of . 

5.3 A real isochrone 

We can use the interpretation of Equation 1151 presented in 
Section 15.2.21 to visualize the limit in which the approxi- 
mation that the isochrone is linear is no longer valid. Once 
the curvature of the isochrone becomes large compared with 
the typical uncertainties for a datapoint, then it cannot be 
approximated to a straight line when the line integral is per- 
formed. However, for the case where the curvature is small, 
one might still be able to use Eauation ll5l interpreting yo as 
the distance of closest approach of the line to the datapoint, 
and 6i referring to the gradient of the isochrone at closest ap- 
proach. Although a rather different derivation, such a tech- 
nique would be identical, sav e some normalisation factor s, 
to the near-point estimator of iFlannerv fc JohnsonI I^S^)- 



5.2.3 Testing the linear isochrone 

Equation 1151 gives us a practicable way of fitting a linear 
isochrone, by minimising as a function of j/o and the gra- 
dient of the isochrone (which is related to 6i). First, if we 
wish to reduce to we must choose the normalisation of 
p in Eauation ll5l Since yo is distributed as a Gaussian with 
a standard deviation of one, this means we must ensure the 
second term is zero. Thus 

^ ln(cr„,sin6',^)+ ^ ln(p(m)) = 0, (16) 

i = l,JV i = l,JV 

giving 

p(m)-'^ = {Grr^,sn^e,^/2^). (17) 

i = l,JV 

From Figure |S| it is clear that 6 is related to the gradient of 
the isochrone by 

^ ^ ^tane.. (18) 
dc cjci 

The value of yo can be found using the above equation, and 
setting x=G in Equations |H| and |^ and substituting into 
Equation Q to obtain 

j/o = -^sin^i + — — ^cos^i. (19) 

For a given linear isochrone and set of simulated datapoints 
this means we can calculate analytically a value for . We 
can then use this to test the 2D numerical code we describe 
below. 

5.2.^ Comparison with a straight line fit with 2D 
uncertainties 

Clearly the best-fitting straight line will be obtained by min- 
imising the sum over all datapoints of j/q in Eauation ll9l We 
can rewrite the equation such that 



6 THE TWO DIMENSIONAL APPROACH 
6.1 Implementation 

We can gain our first insights into the 2D case by reproduc- 
ing the results from the ID-linear and ID-real isochrones of 
Sections 15. 21 and 15. 31 using the 2D algorithm. 

We evaluate the integral in Equation lousing a 2D grid. 
We represent p{c, m) as a grid and, for reasons we will dis- 
cuss later, populate this grid by a Monte Carlo method. 
For these ID isochrones we begin by randomly selecting a 
magnitude, and then assigning a colour according to the 
isochrone. The value of the appropriate pixel of p(c, m) is 
then incremented by one. At the end of the Monte Carlo 
we then ensure that p(m) is one by dividing each pixel by 
the sum of all pixels at that magnitude. This means that 
in practice the initial distribution in magnitude used by the 
Monte Carlo is unimportant, provided it is smooth. 

For each datapoint we can now evaluate Equation 
We multiply each pixel of p{c,m) by Equation |1] In princi- 
ple, before using p we should correct it by the normalisation 
given in Equation 1171 In practice it is simpler to apply a 
correction to the p used for each datapoint, which when the 
probabilities for each datapoint are multiplied together gives 
the same effect. Thus for each datapoint we divide p by the 
normalisation factor (Tm;Sin^i%/27r. To evaluate sin^i we re- 
quire the gradient at each pixel, which we evaluate and store 
at the same time as we calculate p(c, m), by differencing the 
c and m values of the most extreme valued points from the 
Monte Carlo which lie within the pixel. Of course the gra- 
dient is only defined on the isochrone, and we need it for a 
general point in the CMD. We can be arbitrary about how 
we make this generalisation, since our normalisation is only 
there to ensure that if we have a straight line (where the 
gradient is obviously always the same) we obtain a of one 
per datapoint. We therefore choose the gradient for an arbi- 
trary pixel to be that of the isochrone at the magnitude of 
the pixel. At this point one can test the code is functioning 
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correctly by using a linear isochrone and testing the result 
for against the analytical result given in Equation 1151 

To move to the more general 2D case one fills the array 
for p by selecting stars randomly according to some mass 
function. They are assigned companions (or not) according 
to the preferred binary frequency and mass functions, and 
then one uses isochrones to place the resulting systems in 
colour-magnitude space. The remainder of the procedure is 
as before for the linear isochrone. 

6.2 Correlated Uncertainties 

A significant issue with any CMD is that the uncertain- 
ties are correlated, since a change in, say V also re- 
sults in a change in V — I. Perhaps the most obvious 
change in forma l ism t o deal with this is that suggested by 
iTolstov fc Sahal il99(jl . where the actual fitting is carried 
out in a magnitude-magnitude space. We have found it sim- 
pler (and therefore more robust against coding errors) to 
use colour-magnitude space throughout our code. However, 
at the point of evaluating Equation 2] one can calculate the 
argument of the exponential in magnitude-magnitude space, 
reconstructing the uncertainty in the second magnitude us- 
ing the uncertainties in magnitude and colour. In principle 
this allows considerable fiexibility, including the ability to 
deal correctly with data which have been created using a 
colour term in the transformation from instrumental to ap- 
parent magnitude, and a co-efficient other than unity in the 
transformation from instrumental to apparent colour. 

6.3 The distributions of 

Once we have fitted our data, to calculate whether it is a 
good fit we need to know Pr{r'^). To calculate this we must 
first calculate the distribution for a single point, and then 
calculate the expected distribution for the whole ensemble 
of datapoints. 

6. 3. 1 The distribution for one datapoint 

To understand the form of the distribution it is useful 
to begin by considering a classical fitting problem, but 
solved as though it were suitable for r^. In such a problem 
the model isochrone is a curve in colour magnitude space, 
and the uncertainties are treated as 2D Gaussians which are 
infinitely thin in the colour dimension, and have the correct 
width in magnitude space to represent the ID uncertainty. 
We can calculate the chance that a star at a given point on 
the sequence actually appears, due to observational error, 
at a given position on the CMD. If we integrate this along 
the entire sequence we obtain the probability of there be- 
ing a datapoint at any given position on the CMD. Since 
our uncertainties are Gaussian, and the line is a form of 
(5-function, the distribution of probabilities in the plane is 
itself Gaussian. Thus, the likelihood of finding a datapoint 
at given probability, say P, is proportional to the fraction of 
the CMD covered by pixels with that probability, multiplied 
by P. Assuming all pixels have the same area, this can be 
calculated numerically by summing the values of all pixels 
for which P lies within a given (infinitesimal) range. Strictly 
speaking this should only be interpreted in the cumulative 




Figure 7. A model created using the DAM97 isochrones, and 
a binary fraction of 0.5. The uncertainties used are 0.01 mag in 
each filter. 



sense, i.e. that the probability of finding a datapoint with 
a probability of P or less is proportional to the fraction of 
the area covered by each probability less than P, multiplied 
by that probability, and then integrated over all probabili- 
ties less than P. We, of course, have chosen not to work in 
probability P, but in x = — 21nP. Thus we have not quoted 
the chance of a datapoint being at a probability P or less, 
rather the chance of it lying at a given ot more. 

Of course when we perform this sum over the plane, the 
resulting distribution will be that of x^ ■ We can still retain 
a x^ distribution in the plane if we make the uncertainties 
two dimensional, provided we restrict the isochrone to be 
a straight line. But if the model is to be a curve, and/or 
include binaries, the distribution of probability in the plane 
will no longer be Gaussian, and the probability of exceed- 
ing a given value will no longer behave like x^- We can still 
accurately predict the distribution of values of — 21nP we 
expect to get. That is obtained by simply creating a his- 
togram of the probabilities from an image such as that in 
Figure |7| But, this will no longer be distributed as x^, and 
to emphasize this fact we will now refer to — 21nP as r^. 

In Figure |H] we show the cumulative distribution for the 
value of taken from Figure |7] where the uncertainties are 
0.01 magnitudes in each filter. When compared with the x^ 
distribution for one degree of freedom, there are two major 
differences between r^{a = 0.01) and x'^ ■ First r^{a = 0.01) 
has no values below about 1, and second it falls much more 
slowly. The slow fall is the effect of the "plateau" region 
between the single and binary star sequences, which con- 
tributes a large area of low probability, and hence high r^. 
The absence of values below about one is the result of our 
requirement that p{m, c) at a given magnitude integrates 
to one over all colours. This imposes a maximum value on 
P, and hence a minimum on r^. As one moves to larger 
uncertainties (t^(ct = 0.1) in Figure |HJ, these differences 
become less pronounced. The change from T^{a — 0.01) to 
r^{a = 0.1) shows that as the uncertainties become larger, 
tends to x^- The reason for this is clear if one compares 
Figure |7| with Figure |5| As the uncertainties become large 
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Figure 8. The distribution (the probability of exceeding a 
given from the DAM97 isochrones using two different values 
of the uncertainty. A distribution for one degree of freedom is 
shown for comparison. 




Figure 9. A model created using the DAM97 isochrones, and 
a binary fraction of 0.5. The uncertainties used are 0.1 mag in 
each filter. The feature at bright magnitudes is caused by the 
upper cut-off in mass for the DAM isochrones. At this cut-off 
the binary sequence rises to brighter magnitudes than the single 
star sequence. When the single-star sequence ends, the smoothed 
sequence moves redwards. 

compared with the distance between the single and binary 
star sequences, we can approximate them to a single se- 
quence. 

6.3.2 The distribution for many datapoints 

Having calculated for a single datapoint, it would appear 
straightforward to calculate it for an ensemble. We will do 
this by comparison with the case for x^- 

The standard proof for the distribution for many 
datapoints is a generalization of the proof for just two (e.g. 
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Figure 10. The two-dimensional differential probability distri- 
bution of X- The contours are evenly spaced starting at a proba- 
bility of zero. Note that the axes are in x not x^- 
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Figure 11. A two-dimensional differential probability distribu- 
tion of T for the DAM97 isochrones and uncertainties of 0.01 mag 
(horizontal axis) and 0.1 mag (vertical axis). The contours are 
evenly spaced starting at a probability of zero. Note that the 
axes are in r not t^. 

ISahall99^ . One considers a two-dimensional space, with xi 
(not Xi) s-s one axis and X2 as the other. At each point in 
the space one evaluates the probability of obtaining simul- 
taneously values of xi between xi and xi + d-Xi and of X2 
between X2 and xa + ^Xa- This probability is simply ^^^^) 
or in more familiar terms of the differential probability dis- 
tribution of x^, 4xiX2 4t-4t-- This function is plotted in 

Figure [Till using -Pr(x^) for one degree of freedom. The fig- 
ure shows that the probability of obtaining any given value 
of X = Xi + X2 is independent of the value of either Xi 
or xi- This allows the proof for x^ to proceed to its con- 
clusion that the probability of obtaining any given value of 
X^ is proportional to xi times the length of the arc at a 

radius x^ (or more generally the area of the N-dimensional 
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surface). The crucial point here is the interpretation that 
the probability of obtaining a given is simply the line 
integral of the probability along a line of constant 'x^ . For 
-X^ the integral can be performed analytically, because the 
probability is the same along a line of constant x^; this is 
not the case for r^, and in this case the integral must be 
evaluated numerically. 

Figure 1111 shows the equivalent plot to Figure 1101 but 
instead of = X1+X2 we have = r^(cr = 0.01) + r^((T = 
0.1), for the DAM97 isochrones. Before embarking on how 
to use this plot to determine the probability of obtaining a 
given or greater, it is useful to understand the differences 
between Figures 1101 and 1111 The most likely value of x is 
zero, simply because the most probable position for a data- 
point to lie at its value before perturbation by observational 
error. For r this is not the case. At any given value of (say) 
V , there are a range of actual V — I values it could have 
originated from. Furthermore, the large area of the CMD 
covered by binaries (albeit at a low probability), gives a very 
large chance that a star will yield a high r. This point can 
be emphasised in two ways. First, collapsing the plot onto 
the y-axis gives the differential version of the upper curve in 
Figure|Sl with its emphasis on high values of t^. Second, col- 
lapsing the curve onto the x-axis yields a distribution more 
strongly skewed to low values, as one would expect because 
the larger value of a causes the distribution to tend towards 
that for x^- 

The problem with Figure 1111 from the point-of-view 
of evaluating Pr{T'^) is that ^^^^^^ along a line of con- 
stant is not independent of either Prij^ia = 0.01)) or 
Pr{T^{o- = 0.1)). This precludes us using the analytical x - 
method to evaluate Pr(T^). However, this does not stop us 
undertaking a numerical line integration along fixed curves 
of to evaluate the probability of exceeding that value of 
T^. The route we have followed to perform this numerical 
integration relies on the fact that the arc length is propor- 
tional to the number of pixels. One can calculate a grid of 
the differential probability (i.e. the probability of obtaining 
a certain r^, not of exceeding it), akin to Figure ITTI by sim- 
ply multiplying the two differential distributions together. A 
simple histogram of the number of pixels with a given value 
of is then . Unfortunately, when one generalizes this to 
say, the 100 dimensions needed for a 100 point dataset, the 
calculation becomes intractable in reasonable computation 
times. We therefore perform the calculation dimension by 
dimension. We take the first two t'^ distributions, and mul- 
tiply each point in one distribution by every other point in 
the other. We then bin the result into bins of = Ti + r| 
to produce a new, one dimensional distribution. This can 
then be multiplied by the next dimension, and the process 
repeated until all dimensions have been allowed for. We then 
integrate this to change from a differential to a cumulative 
distribution. 



6.3.3 and practicalities 

The method described thus far is very general, with little 
tailoring to the specific problems of CMDs. In calculating 
the expected distribution of however, we must return to 
the subject of normalising-out the mass function, a proce- 
dure first discussed in Section f3.1l If the model for Figure |5| 



included a mass function with increasing numbers of stars 
at fainter magnitudes, we would expect to see a much higher 
probability density in the bottom part of the plot than in 
the top part. Since we expect the majority of our datapoints 
to lie at faint magnitudes, this is perfectly correct. The best 
values will be found by placing the majority of the points 
in the regions of highest probability density; thus the mass 
function is a driving force in the fitting procedure. Note, 
however, that the distribution of is different for bright 
and faint magnitudes, due largely to the change in slope of 
the pre-main-sequence. This means that the distribution of 
for a single datapoint is different for different mass func- 
tions. For the observational reasons explained in Section FS.ll 
it is not desirable to introduce the mass function as a set 
of free parameters, and so we have normalised-out the mass 
function in our models by setting the integral of p{m, c) over 
all colours at a given magnitude to one. This has the addi- 
tional advantage that the distribution of reduces to that 
for x^ for data with uncertainties in two dimensions fitted 
to a straight line (Section |KJ. 

Given that we are not interested in determining the 
mass function, just the age and distance of clusters, how are 
we to calculate a distribution in the case of a normalised- 
out mass function? Our method is to calculate the distri- 
bution of for each data point using only the region of 
the CMD within ±3(Tm of its measured magnitude. Thus 
the ensemble of individual probability distributions, and 
hence the distribution for the fit as a whole reflects the ac- 
tual distribution of datapoints in 1/-band magnitude. This 
has the incidental advantage of greatly speeding the calcu- 
lation, the limiting factor being smoothing the image by the 
uncertainty for the datapoint, for which the run-time scales 
linearly with the magnitude range used. 

This "bootstrapping" of the mass function leads to a 
fundamental limit on how well we can predict Pr(T^). Con- 
sider the hundred simulated observations of Section r3.2l Be- 
cause the datapoints are all slightly different, then for each 
dataset we have a different prediction for the distribution 
Pr(T^). This situation is illustrated in Figure WI\ where the 
solid curve shows the mean of all 100 predictions. To assess 
the range of predictions we sorted the distributions by the 
value of at a probability of 0.5, and then plotted (as a 
dotted lines) the distributions that enclosed in middle 50 
percent of these values, i.e. the 25th to the 75th percentiles 
of the distribution. These cover a range of about 1 percent 
in r^. This indicates that the prediction for Pr(r^) from a 
single observation (such as the solid line in Figure 2J is un- 
certain at the ±1 percent level in which corresponds ±0.1 
in Pr{r^). 

There is a final complication which adds a further un- 
certainty to the absolute value of r^. The method outlined 
above only calculated the distribution Pr(T'^) for matching 
the data directly to a model, with no free parameters. In the 
X^ case, for large values of the number of degrees of freedom 
(i.e. the number of free parameters n subtracted from the 
number of datapoints A*'), Pi (x^) scales with the number of 
degrees of freedom. This implies that we need to multiply 
our Pr{T^) by {N — n) /N. We have no formal proof for this, 
but the following numerical experiment supports this view. 
If one takes a simulated observation and compares it with 
the underlying model one obtains a value for r^. If it is now 
compared with a grid of models with a range of distance 
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Figure 12. The distribution of obtained by fitting 100 sim- 
ulated observations (histogram), compared with the mean of the 
predictions for the distribution of for each dataset (line). The 
dotted lines enclose the 50 percent of the predictions with val- 
ues at Pj.(t^) = 0.5 closest to the mean, i.e. the 25th to the 75th 
percentiles of the distribution. The histogram is the same as that 
in Figure l4l 

moduli and ages, the best fitting model will have a smaller 
value of t'^. Over many realisations we find the mean change 
is a factor of (A^ — n) /TV. 

In summary, therefore, we calculate the expected dis- 
tribution of by first considering one datapoint at a time, 
after the fitting process is complete. We smooth the best 
fitting distribution in colour-magnitude space according to 
the uncertainties for that point, and then extract the distri- 
bution of probability P as a function — — 21nP. We then 
multiply all the distributions together, using the method de- 
scribed above, to find the expected distribution of for our 
dataset. Finally, we can divide our fitted value of (and 
the values of t'^ in P^Ct^)) by the expected value of at 
Pr(T^)=0.5. In analogy with xi this gives us r^, that has an 
expected value of unity for a good fit. 



it is blurred by the uncertainties and becomes P. Practi- 
cally however, once one is a few a from the sequence nu- 
merical rounding ensures that taking the logarithm of this 
probability causes a numerical error. The underlying philo- 
sophical issue here is that these datapoints are probably not 
described by our model (they are background or foreground 
contamination) and at some point these data should be re- 
moved from the fitting process. The classical way of dealing 
with such a situation is an A'^cr clipping scheme, removing 
datapoints from the calculation of r'^ once they lie at very 
low probabilities {Na from the expected value). Simple clip- 
ping would be a recipe for numerical instability, so instead 
we use a soft clipping scheme. To achieve this we simply add 
a small probability (e""'^^^") to Pi for each datapoint, the 
value we use amounting to a maximum of 20 for each 
datapoint. We then search for the minimum in space us- 
ing the full dataset, but once the best fit has been found, we 
clip out all the datapoints whose exceeds half the maxi- 
mum set. It this subset for which we then calculate the 
expected value of (see Section f6. 311 . 

7.2 Magnitude independent uncertainty 

In ad dition to the statistic al uncertainty given for each data- 
point, |n^1o^^^ i2002l) also point out that there is a mag- 
nitude independent uncertainty for each datapoint, due to 
uncertainties in the profile correction. Essentially this is the 
uncertainty due to correcting the magnitude measurements 
back to the large apertures required for standard stars. This 
should be clearly distinguished from the error in the trans- 
formation from the natural to the standard system, which 
has the effect of shifting all the data points in the same 
direction. We use a magnitude independent uncertainty of 
0.01 mags for each filter (thus 0.01 mags in V and 0.014 
in V — /) as a good approxim ation to the magnitude inde- 
pendent uncertainty given bv iNavlor e t al. (200^). This is 
added in quadrature with the statistical uncertainties. As 
we shall see below, this value is also justified by the fact 
that we obtain a reasonable value for Pr(r^). For datasets 
where this is not the case, one has the possibility of adjust- 
ing the magnitude independent uncertainty until a Pr(r^) 
of approximately 50 percent is obtained. 



7 NGC2547 - A WORKED EXAMPLE 

An important test of any algorithm is whether it will work 
with real, as well as simulated data. We have chosen as 
our test dataset the X-ray selected sample of members of 
the young open cluster NGC2547, which we first fitted in 
[Navlor ct al. (20o3). We have chosen this cluster as the 
dataset has already been fitted by one of the authors using 
traditional "by eye" methods, allowing us to make a direct 
comparison of the methods. 



7.1 Soft clipping 

The main practical problem which must be solved is that 
some of the datapoints lie in regions of the CMD to which 
our model assigns probabilities (p(c, m)) of zero. Of course, 
in principle no point on the CMD has zero probability, once 



7.3 Extreme mass-ratio binaries 

A second issue is the absence from our models of extreme 
mass-ratio binaries. This was first pointed out in Section 
H and is due to the fact that the isochrones do not reach 
sufficiently low masses to allow us to model the most ex- 
treme binaries. In the simulations we have performed thus 
far this is not an issue as both the simulated data and the 
models suffer from the same cut-off. To simulate the case 
of real data we therefore created a new set of models where 
the lowest mass stars available for the binaries were O.25M0 
(compared with 0.017Mq in the isochrones). We then fit- 
ted these models to simulated datasets with the underlying 
parameters used in Section |3 which therefore contained bi- 
naries created using the full range of masses available in 
the isochrones. The mean parameters from 30 simulated ob- 
servations were 40.07±0.16Myr and a distance modulus of 
-0.002±0.001, where the uncertainties are standard errors. 



12 T. Naylor and R.D. Jeffries 



Thus the effect on the parameters of a low-mass cut-off for 
the binaries is undetectable in our simulation, and certainly 
an order-of-magnitude below our statistical uncertainties. 

7.4 Results 

As in iNavlor et all ||2002|) we used the models of DAM97, 
and extinctions of E{V - 7c)=0.077 and Ay = 0.186. We 
began by assuming 50 percent of the unresolved images are 
binaries, 50 percent single stars, and assumed (even when 
we changed the binary fraction) that the masses for the sec- 
ondary stars were uniformly distributed between the mass of 
the primary and the lowest mass available in the models. The 
best fit gives Pr(T^)=0.41 and is shown in Figure [T^ Con- 
sidering this is a new technique, finding an acceptable value 
of (equivalent to finding a reduced of approximately 
one) is very encouraging both in terms of the verisimili- 
tude of the models, and of the accuracy of our observations. 
The best fitting values and 67 percent confidence limits are 
38.5le;5Myrs and a true distance modulus of 7.79lo:o5- Al- 
though the fit is good, close examination shows that between 
V^=13.5 and 15 the model seems to lie show systematically 
below the data. First, it should be made clear that this effect 
is small (0.02 mags in V — I). Second, it might be thought 
that by decreasing the distance modulus one could fit these 
points, and fit the lower pre-main-sequence by choosing a 
slightly greater age. In fact the models show that the region 
at V = 14 is moving bluewards with age faster than the 
lower part of the sequence, and the test has chosen a rea- 
sonable compromise. The systematic residuals are, therefore, 
real differences between the shape of the model isochrones 
and observed sequences. 

7.5 Changing the binary fraction 

Although Figure lOI shows the that the single star sequence 
is broadly correct, it is harder to assess the fit to the binary 
stars. The distribution of for the individual datapoints 
gives us a useful insight into this. In Section 16.31 we de- 
scribed how we calculate the probability distribution of 
for each datapoint before multiplying them together to pre- 
dict the overall value for for the fit. Instead of multiplying 
them, the sum of the probability distributions gives us the 
expected distribution of for the datapoints in the best 
fit. Before carrying out a comparison with the NGC2547 
data, we show in Figure [T^ the comparison between the ac- 
tual (histogram) and predicted (curve) distributions of the 
single-point values for the simulated cluster used in Sec- 
tion|3 This shows the prediction works very well. In Figure 
1151 we show the same plot for NGC2547. The real distribu- 
tion differs from the model in the mid-ranges of r^, in partic- 
ular there are more points at ~ 4 than the model predicts. 
High values of correspond to low values of Pi . The major- 
ity of the low values of Pi will occur in the region between the 
single-star and equal-mass-binary sequences, implying that 
we have underestimated the binary fraction. To test this hy- 
pothesis, and to establish whether one must correctly model 
the binary fraction to determine reliable ages and distances, 
we re-fitted the data with a binary fraction of 80 percent. 
The actual and predicted distributions of shown in Figure 
1161 Increasing the binary fraction has indeed increased the 




Figure 14. The expected distribution of t"^ (curve) and that 
obtained from the data (histogram) for the simulated dataset of 
Section 13 




Figure 15. The expected distribution of (curve) and that 
obtained from the data (histogram) for NGC2547, assuming a 
binary fraction of 50 percent. 

number of high valued t'^ points, but in fact the model is 
now systematically lower than the data. Furthermore Pr('r^) 
is now only 0.14. Clearly a binary fraction of 50 percent is 
a better fit to the data than 80 percent. 

There is a strong temptation at this point to attempt to 
model the properties of the binaries, and indeed the ability 
to extract such information is one of our primary motiva- 
tions for developing r^. However, it clearly lies outside the 
scope of this introductory paper to do so. Furthermore, in 
this case the dataset itself is unsuited to such an experiment. 
Aside from the question as to whether an X-ray selected 
sample is biased towards binary stars, the reader should 
also note that some stars appear above even the equal-mass- 
binary sequence. Although some of these may be multiple 
systems with more than two members (which we have ig- 
nored in our models), there are three times more of them 
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Figure 13. The X-ray selected members of NGC2547 (green circles with error bars) and best fitting model (background image), for a 
binary fraction of 0.5. 




Figure 16. The expected distribution of (curve) and that 
obtained from the data (histogram) for NGC2547, assuming a 
binary fraction of 80 percent. 

than we might expect from lDuauennov fc Mavorl lll99 j) . For 
the majority of these objects, therefore, our result implies 
that there is a non-photospheric contribution to their lu- 
minosity, which again would not be surprising for an X-ray 
selected sample, or that we have significant contamination 
from foreground dwarfs. Either case would clearly preclude 
a photometric determination of binary parameters. Despite 
these cautions, it is interesting to note that we obtain a cred- 
ible value of Pr(T^) for a binary fracti on which is close to 
that determined bv lNavlor et al.1 tOQ'j) (60 percent), when 



they too assumed a flat mass ratio distribution. Equally im- 
portantly, with a binary fraction of 80 percent we obtained a 
distance modulus of 7.82 and an age of 37.5Myr, which is not 
significantly different from the result for a binary fraction of 
50 percent. The conclusion that the binary fraction has little 
effect on the derived parameters is, in retrospect, unsurpris- 
ing. It means that the fit is being driven by the single-star 
sequence, and not being dragged to brighter magnitudes by 
the binaries. 



7.6 Comparison with previous work 

Although it is easiest to quote our result in terms of single 
parameters and their uncertainties, the derived age and dis- 
tance are strongly correlated. This is summarised in our 
space in Figure [T71 Interestingly there is a second minimum 
(not as deep as the primary one) at 53Myr and a distance 
modulus of 7.63 mags. This is exactly the effect discussed 
in Section 17.41 and when the fit is examined, we find that 
the data at bright magnitudes lie systematically above the 
model. 

The age/distance- modulus pair of 25±5Myr 8.0 5 ±0.10 
derived from the V /V - Ic data in iNavlor et ai] ||2002|) 
clearly lie in the valley of Figure [T7I The p osition within 
that v alley cannot be directly compared with lNavlor et all 
i2002h as they used B/B — V data to constrain the dis- 
tance. Perhaps most remarkable is th e excellent agree- 
ment with the lithium depletion age of I Jeffri es fc Oliveiral 
(|20Q5J. The age derived from the lithium depletion bound- 
ary depends on the dist ance modulus. Using the data of 
iJeffries fc Oliveiral ^2005^ and the DAM97 models we derive 
an age of 37 ± 3Myr for our best-fit distance modulus of 7.8 
mags. However, we can also plot the constraint in Figure [T7I 
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AGE (MYR) 

Figure 17. The grid for fitting the NGC2547 X-ray members 
to a model with an 80 percent binary fraction. The minimum 
value of is 697.8 and the white contours are the 67 percent 
(r2=704.6) and 95 percent {t2=711.8) confidence limits. The ob- 
served lithium depletion boundary requires the age and distance 
to lie between the two green lines. 



which emphasises the concordance between the lithium and 
isochronal ages. Our error bars in distance just fail to over- 
lap at Iff with those fr om HIPPARCOS (8.18lJ5;26) given by 
iRobichon et al.l ilQQS/l. but the distances are clearly not in- 
consistent. Our conclusion is, therefore, that when used with 
real data fitting gives credible values and uncertainties. 



8 CONCLUSIONS 

We have developed a maximum likelihood method for deter- 
mining parameters for an isochronal population which con- 
tains binaries, from its colour-magnitude diagram. We have 
used numerical simulation to demonstrate it is correct, and 
used it on a practical example. There is clearly scope for 
further development. Most obviously one could search many 
more parameters than we have, determining, for example, 
binary fraction and mass ratio distribution, mass function, 
metallicity, or extinction. Several of these could be allowed 
to vary simultaneously. 

One could also use this as a search statistic, looking for 
populations of a given age in large are surveys. Here the 
absolute value of would measure how likely a given "se- 
quence" is to have occurred by chance. Furthermore, one 
could not only search an N-dimensional colour magnitude 
space, but might also wish to use other parameters, such 
as position on the sky modelled against a clustered distribu- 
tion. Finally there is also a range of other problems to which 
the technique might be applied such as mod elling mass seg- 
regat ion in the mass-radius diagram (e.g. iLittlefair et alJ 
l2003r . and one could even conceive of a replacement for the 
ID Kolmogorov-Smirnov test where the datapoints had as- 
sociated uncertainties. 
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